Method in pipelined data processing

ABSTRACT

A method in a processor is presented, in which data is processed in a pipelined manner, the data being included in a plurality of contexts, comprising a first context (3), in addition to which a plurality of operations is adapted to be executed on the contexts. The method comprises executing an initial operation step (6 a) of a first operation on the first context (3), and subsequently commencing an execution of an initial operation step (7 a) of a second operation on the first context before an execution on the first context (3) of a following operation step (6 b) of the first operation is completed.

TECHNICAL FIELD

The invention concerns a method in a processor, in which data is processed in a pipelined manner, the data being included in a plurality of contexts, comprising a first context, in addition to which a plurality of operations is adapted to be executed on the contexts.

BACKGROUND

In data processing, the rate at which data is processed is an important factor for an efficient processor. A way to allow a high rate of data passing through a processor is to perform pipelined processing of the data, i.e. allowing the process to be executed on one set of data before the processing of previous sets of data is finalized. Such processes are typically carried out in a number of stages at which operations are executed on contexts including sets of data.

SUMMARY OF THE INVENTION

It is an object of the present invention to increase the rate of data being processed in a data processor.

This object is reached by a method of the type mentioned initially, characterized in that the method comprises commencing an execution on the first context of a second operation before a previously commenced execution on the first context of a first operation is completed. This means that in addition to pipelined processing of the data stream, the execution of each operation is pipelined. Thus, a form of “two dimensional” pipelining is achieved. On one hand the data stream is pipelined, and on the other hand, the “operation stream” is pipelined.

The object is also reached with a method of the type described initially, including executing an initial operation step of a first operation on the first context, and subsequently commencing an execution on the first context of an initial operation step of a second operation before an execution on the first context of a following operation step of the first operation is completed. Preferably, each context passes a plurality of consecutive stages, whereby the initial operation step of the first operation is executed on the first context at a first stage, the following operation step of the first operation is executed on the first context at a second stage, and the initial operation step of the second operation is executed on the first context at the second stage.

Thus, the operation pipelining can be done in a number of steps in the pipelined data processor. If each operation is performed in N steps, the processor can run at a higher frequency, i.e. N times higher than in a case with no operation pipelining. This means that during a certain time interval more instructions can be executed. This can provide for a higher data bandwidth throughput in the data stream pipeline.
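The frequency argument above can be made concrete with a small sketch. It assumes an idealized case in which splitting every operation into N equally long steps shortens the clock period by exactly a factor of N; the function names and the 100 MHz base clock are hypothetical and chosen only for illustration.

```python
def pipelined_clock_mhz(base_clock_mhz: int, steps_per_operation: int) -> int:
    """Idealized clock frequency when each operation is split into N steps.

    Assumption: the N steps are equally long, so the clock period shrinks
    by exactly a factor of N compared with no operation pipelining.
    """
    if steps_per_operation < 1:
        raise ValueError("an operation needs at least one step")
    return base_clock_mhz * steps_per_operation


def cycles_in_interval(clock_mhz: int, interval_us: int) -> int:
    """Number of clock cycles in an interval given in microseconds."""
    return clock_mhz * interval_us


base = 100  # hypothetical clock in MHz without operation pipelining
fast = pipelined_clock_mhz(base, steps_per_operation=3)
print(cycles_in_interval(base, 10), cycles_in_interval(fast, 10))  # prints "1000 3000"
```

Under these assumptions, three times as many clock cycles fit into the same interval, so correspondingly more operation steps can be started in that interval.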

Preferably, the method comprises receiving at the second stage a result of an execution of the initial operation step of the first operation. This provides for commencing the execution of an operation at one stage, and continuing the execution at a consecutive stage.

Preferably, where the first operation comprises a partial operation of executing an instruction and a partial operation of writing a result of the said instruction execution into a destination in a register, and the second operation comprises the partial operation of fetching an operand, the method can comprise the following steps: Determining if a position in the register, from which the operand is to be fetched in the second operation, is identical with the destination of the partial operation, of the first operation, of writing a result. If the result of the step of determining is negative, fetching the operand from the register. If the result of the step of determining is positive, fetching the result of the said instruction execution. Thereby it is possible to initiate an operation on a context without having to wait for an operation previously initiated on the same context to be completed. This will facilitate increasing the data processing rate of the processor.

BRIEF DESCRIPTION OF FIGURES

Below, the invention will be described in detail with reference to the drawings, in which

FIGS. 1, 3, 4 and 6 show block diagrams of processes in pipelined data processors,

FIG. 2 shows a table correlating objects in the process of FIG. 1, and

FIG. 5 shows a schematic example of a program code for functions depicted in FIG. 4.

DETAILED DESCRIPTION

FIG. 1 depicts schematically a data processing pipeline 1, in a data processor. The pipeline comprises a plurality of stages 2 a-2 f. For the purpose of this presentation only six stages are shown, whereas any number of stages could be used in the pipeline, and in reality a considerably larger number of stages would be used.

At each clock cycle of the data processor, a context 3, including data to be processed, is received at each stage from the preceding stage, i.e. the stage immediately to the left in FIG. 1. In FIG. 1 each stage 2 a-2 f is depicted at a time corresponding to one clock cycle after the preceding stage, which is indicated by the time axis T. Thus, FIG. 1 shows any stage presented at a time occurring after the time of presentation of the preceding stage, the difference in time corresponding to the duration of each clock cycle, Δt. Hence, the context 3 in any stage in FIG. 1 corresponds to the context in any other stage in FIG. 1. Of course, at the same point in time the stages have different contexts, which is a feature that makes the data processing pipelined.

Each context 3 includes a packet with data, which may be processed by the processor, a register file, flags, and an instruction counter or instruction pointer, e.g. a row instruction pointer, as described in the international patent applications PCT/SE01/01134 and PCT/SE01/01133, each filed by the applicant of this application and incorporated herein by reference.

In the data processing pipeline a number of operations are performed in connection to each context. Each operation consists of a number of partial operations. In a plurality of stages 2 a-2 e, during each clock cycle, partial operations are performed in connection to the context 3. Each operation can comprise executing in a logic unit 5 a-5 e in the respective stage an instruction stored in an instruction memory, not shown in FIG. 1. The logic unit could comprise an ALU.

Each operation comprises a number of steps, each comprising one or more of the partial operations. In pipelined data processing, an operation typically comprises the partial operations: (i) instruction fetch, (ii) instruction decoding, (iii) operand fetch, (iv) execution, (v) branch, and (vi) write back. These partial operations could be allocated to any number of operation steps. Suitably, an operation could contain two steps, the first containing partial operations (i) to (iii) and the second containing partial operations (iv) to (vi).

For a better understanding of the concept of the invention, each operation in this example comprises three operation steps, for this presentation referred to as an initial operation step 6 a, 7 a, 8 a, an intermediate operation step 6 b, 7 b, 8 b, and a final operation step 6 c, 7 c, 8 c. The intermediate and the final operation steps are each also referred to as a “following operation step”.

In general, each operation can comprise any number of operation steps. It should also be kept in mind that the context can alternatively be received by a stage of the processing pipeline without any partial operations being performed on it, or on parts of it.

A first context 3, which could be any context in the processor, is received at a first stage 2 a, which could be any stage in the processor. In the first stage 2 a, a first initial operation step 6 a of a first operation is performed on the first context 3. A first initial operation step result R6 a is generated as a result of the first initial operation step 6 a being performed on the first context 3.

Subsequently, in a second stage 2 b, the first context 3, modified by the first initial operation step 6 a, is received from the first stage 2 a. The modified first context 3 comprises the first initial operation step result R6 a. It should be noted that the pipeline is adapted so that when a context is received in a stage from a previous stage, the previous stage receives another context, as described in the above referenced international patent application PCT/SE01/01134.

In a pipelined manner, essentially simultaneously with the first context 3 being received in the second stage 2 b, a second context, not shown, is received at the first stage 2 a.

In the second stage 2 b, a first intermediate operation step 6 b of the first operation is performed on the first context 3, based on the first initial operation step result R6 a. As a result, a first intermediate operation step result R6 b is generated.

During the same clock cycle, at t+Δt, the initial operation step 6 a of the first operation is executed on the second context. Thus, the initial operation step 6 a of the first operation is executed on the second context before the execution on the first context 3 of the following operation step 6 b of the first operation is completed. In other words, an execution on the second context of a first operation is commenced before a previously commenced execution on the first context of the first operation is completed.

Also, in the second stage 2 b, a second initial operation step 7 a of a second operation is performed on the first context 3, and a second initial operation step result R7 a is generated as a result thereof.

Subsequently, in a third stage 2 c, the modified first context 3 is received from the second stage 2 b. Thereby, the first context 3 comprises the second initial operation step result R7 a and the first intermediate operation step result R6 b.

In the third stage 2 c, a first final operation step 6 c of the first operation is performed on the first context 3, based on the first intermediate operation step result R6 b. Since, in this example, each operation consists of three operation steps, by the first final operation step 6 c, the partial operations of the first operation on the first context 3 are completed.

Also, in the third stage 2 c, a second intermediate operation step 7 b of the second operation is performed on the first context 3, based on the second initial operation step result R7 a. A second intermediate operation step result R7 b is generated as a result thereof.

Also, in the third stage 2 c, a third initial operation step 8 a of a third operation is performed on the first context 3, and a third initial operation step result R8 a is generated as a result thereof.

FIG. 2 shows a table in which the first two columns correlate operation steps and stages in the example in FIG. 1. It can easily be understood that, since different steps of each operation are carried out in separate stages of the processor pipeline, a person programming the instruction memory will be faced with a task that can seem complicated in cases where there are a lot of stages in the pipeline. Therefore, the processor is arranged so that all steps, 6 a-6 c, 7 a-7 c, etc., of each operation are presented to a programmer as being carried out in the same stage, 2 a, 2 b, etc., of the pipeline, see the third column in the table in FIG. 2. This will facilitate the job of the programmer since it keeps the programming of the instruction memory clear and well-arranged. Thus, the true correlation of operation steps and stages of the processing pipeline will not be visible to the programmer.
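The two views of the pipeline can be sketched as follows. The layout of FIG. 2 is not reproduced here; the sketch only assumes, as in the example of FIG. 1, that each operation starts one stage after the preceding operation and that each following operation step runs one stage later. The function name step_table and the label format are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase


def step_table(num_operations=3, steps_per_operation=3, first_operation_numeral=6):
    """Return rows of (operation step, actual stage, stage shown to the programmer)."""
    rows = []
    for op in range(num_operations):
        for step in range(steps_per_operation):
            label = f"{first_operation_numeral + op}{LETTERS[step]}"
            actual_stage = f"2{LETTERS[op + step]}"   # one stage later per step
            presented_stage = f"2{LETTERS[op]}"       # stage of the initial step
            rows.append((label, actual_stage, presented_stage))
    return rows


for row in step_table():
    print(*row)
```

In this sketch the steps 6a, 6b and 6c are actually executed at stages 2a, 2b and 2c, while all three are presented to the programmer as belonging to stage 2a.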

Referring again to FIG. 1, in this example, the third stage is the last stage at which an operation is initiated. A fourth, fifth and sixth stage 2 d, 2 e, 2 f are located at the end of the pipeline. Since all steps of the second and third operations appear to a programmer of the processor to be executed in the second and third stage 2 b, 2 c, respectively, the stages following the third stage 2 c will be invisible to the programmer.

In the fourth stage 2 d, the modified first context 3 is received from the third stage 2 c, whereby it comprises the third initial operation step result R8 a and the second intermediate operation step result R7 b. A second final operation step 7 c is performed, based on the second intermediate operation step result R7 b. Thereby, the partial operations of the second operation are completed. A third intermediate operation step 8 b is performed, based on the third initial operation step result R8 a, resulting in a third intermediate operation step result R8 b.

Subsequently, in the fifth stage 2 e, the modified first context 3 is received from the fourth stage 2 d, whereby it comprises the third intermediate operation step result R8 b, based on which a third final operation step 8 c is performed. Thereby, the partial operations of the third operation are completed. In the sixth stage 2 f the context 3 is received after completion of partial operations of three operations.
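The “two dimensional” pipelining of the example of FIG. 1 can be illustrated with a short sketch that combines the data stream and the operation stream. It assumes that one new context enters the first stage at every clock cycle, that context 0 enters at cycle 0, and that three operations of three steps each are executed; the function names and the numbering of contexts are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase


def steps_at_stage(stage, num_operations=3, steps_per_operation=3):
    """Operation steps executed at a given stage index (0 corresponds to stage 2a)."""
    steps = []
    for op in range(num_operations):
        step = stage - op  # operation number `op` is initiated at stage number `op`
        if 0 <= step < steps_per_operation:
            steps.append(f"{6 + op}{LETTERS[step]}")
    return steps


def snapshot(cycle, num_contexts, num_stages=6):
    """For one clock cycle, map each occupied stage to (context index, steps executed)."""
    view = {}
    for stage in range(num_stages):
        context = cycle - stage  # contexts advance one stage per clock cycle
        if 0 <= context < num_contexts:
            view[f"2{LETTERS[stage]}"] = (context, steps_at_stage(stage))
    return view


print(snapshot(cycle=2, num_contexts=3))
```

At cycle 2 in this sketch, context 0 is in stage 2c with steps 6c, 7b and 8a, context 1 is in stage 2b with steps 6b and 7a, and context 2 is in stage 2a with step 6a.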

Usually, in a data processing pipeline the execution of an operation is dependent upon the result of a previous execution of another operation. According to a preferred embodiment of the invention, multiple branch executions of operations or operation steps are performed to facilitate commencing execution of subsequent operations before previously initiated operations have been completed.

For an example of multiple branch execution, FIG. 3 depicts schematically a data processing pipeline 1, similar to the pipeline in FIG. 1. For this presentation the pipeline comprises only five stages 2 a-2 e.

As in the example presented with reference to FIG. 1, partial operations including the execution of instructions stored in an instruction memory, not shown in FIG. 3, are performed on a context 3 by logic units 5 a-5 d in the stages. As in the example above, each operation comprises three operation steps: an initial operation step 6 a, 7 a, an intermediate operation step 6 b, 7 b, and a final operation step 6 c, 7 c. In this example, only two operations are executed.

A first context 3 is received at a first stage 2 a, where a first initial operation step 6 a of a first operation is performed. A first initial operation step result R6 a is generated as a result of the first initial operation step 6 a being performed on the first context 3.

Subsequently, in a second stage 2 b, the first context 3, comprising the first initial operation step result R6 a, is received from the first stage 2 a. In the second stage 2 b, a first intermediate operation step 6 b of the first operation is performed on the first context 3, based on the first initial operation step result R6 a. As a result, a first intermediate operation step result R6 b is generated.

In this example, the execution of a second operation is dependent upon the final result of the first operation. We assume that there are two execution paths of the second operation, both of which are initiated in a second initial operation step 7 a of the second operation. Since the second initial operation step 7 a is carried out in stage 2 b, before a first final operation step of the first operation has been executed, both execution paths of the second initial operation step 7 a are carried out, resulting in two alternative second initial operation step results, R7 a 1, R7 a 2.

In a real utilization of the invention more than two execution paths are possible in an operation, whereby all paths may have to be executed or at least initiated before a subsequent operation is initiated.

Subsequently, in a third stage 2 c, the modified first context 3 is received from the second stage 2 b. Thereby, the first context 3 comprises the two alternative second initial operation step results, R7 a 1, R7 a 2, and the first intermediate operation step result R6 b.

In the third stage 2 c, a first final operation step 6 c of the first operation is performed on the first context 3, based on the first intermediate operation step result R6 b, whereby the partial operations of the first operation on the first context 3 are completed. Thereby, a first operation result, R6, is generated.

Also, in the third stage 2 c, two second intermediate operation steps 7 b of the second operation are performed on the first context 3, each based on one of the two alternative second initial operation step results, R7 a 1, R7 a 2. One second intermediate operation step result, R7 b 1, is generated as a result of the second intermediate operation step 7 b being performed on the basis of one of the two alternative second initial operation step results, R7 a 1. Another second intermediate operation step result, R7 b 2, is generated as a result of the second intermediate operation step 7 b being performed on the basis of the other of the two alternative second initial operation step results, R7 a 2.

In a fourth stage 2 d, the modified first context 3 is received from the third stage 2 c, whereby it comprises the first operation result, R6, and both second intermediate operation step results R7 b 1, R7 b 2. Based on the first operation result, R6, it is determined whether a second final operation step 7 c should be carried out based on one or the other of the second intermediate operation step results R7 b 1, R7 b 2. When this is determined, the second final operation step 7 c is performed, based on whichever of the two second intermediate operation step results R7 b 1, R7 b 2, that was determined to form a base of the second final operation step 7 c. Thereby, the partial operations of the second operation are completed.
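The walk through the four stages of FIG. 3 can be sketched as below. The arithmetic in each operation step is a hypothetical placeholder, since the source does not specify what the operations compute; only the structure is taken from the description: both paths of steps 7a and 7b are executed before R6 exists, and the selection is made at stage 2d.

```python
def run_two_operations(x: int) -> dict:
    """Carry one context through stages 2a-2d, executing both paths of operation 7."""
    ctx = {"x": x}

    # Stage 2a: initial step 6a of the first operation.
    ctx["R6a"] = ctx["x"] + 1

    # Stage 2b: intermediate step 6b, plus BOTH paths of the initial step 7a,
    # because the first operation result R6 is not available yet.
    r6b = ctx["R6a"] * 2
    r7a1 = ctx["x"] + 10   # path 1 (hypothetical computation)
    r7a2 = ctx["x"] - 10   # path 2 (hypothetical computation)
    ctx.update(R6b=r6b, R7a1=r7a1, R7a2=r7a2)

    # Stage 2c: final step 6c yields R6; step 7b runs on both alternatives.
    r6 = ctx["R6b"] > 10
    r7b1 = ctx["R7a1"] * 3
    r7b2 = ctx["R7a2"] * 3
    ctx.update(R6=r6, R7b1=r7b1, R7b2=r7b2)

    # Stage 2d: R6 decides which intermediate result the final step 7c uses.
    chosen = ctx["R7b1"] if ctx["R6"] else ctx["R7b2"]
    ctx["R7"] = chosen + 1
    return ctx


def reference(x: int) -> int:
    """Same computation done strictly one operation after the other."""
    r6 = (x + 1) * 2 > 10
    return ((x + 10) * 3 if r6 else (x - 10) * 3) + 1


print(run_two_operations(5)["R7"], run_two_operations(1)["R7"])  # prints "46 -26"
```

The comparison with the strictly sequential reference function illustrates that executing both paths and selecting afterwards gives the same result as waiting for the first operation to complete.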

A number of alternatives of multiple branch execution are possible. For example, different numbers of execution paths could be performed at different steps of the same operation. Referring to the example in FIG. 3, a plurality of execution paths could be performed based on each initial operation step result R7 a 1, R7 a 2, in the intermediate operation step 7 b, resulting in more than two intermediate operation step results.

Alternatively, only one execution path could be performed in the initial operation step 7 a, upon which two or more execution paths of the following, or intermediate, operation step 7 b are performed.

In general, according to a preferred embodiment of the invention, where at least one of the operation steps of an operation comprises at least two alternative execution paths, at least two of the alternative execution paths of the operation step can be executed at a stage of the processing pipeline. Thereby, results are obtained of at least two of the executions of the alternative execution paths. Based on a result of an execution of another operation, initiated before the initiation of the said operation, it is then determined on which one of these results the execution of the operation step that follows the operation step comprising the alternative execution paths is to be based.

Multiple branch execution, as described above, allows for the execution of an operation to commence, in spite of this execution being dependent on the result of a previously commenced execution of another operation, and the latter execution not being finalized.

For a further example of multiple branch execution we refer to FIGS. 4 and 5. For simplicity the example contains a data processing pipeline with only four stages as depicted in FIG. 4. FIG. 5 shows a program code for the pipeline depicted in FIG. 4.

In a first stage 2 a a context 3 is received. In this stage 2 a two partial operations of a first operation are performed regarding the context 3. One of these partial operations is a fetch partial operation 6F, including fetching an instruction in an instruction memory (not shown). The other partial operation is a decode partial operation 6D, including decoding of the fetched instruction. In FIG. 5 these partial operations are executed according to rows 1 and 2.

In a second stage 2 b, an execute partial operation 6E of the first operation is performed on data related to the context 3 according to the instruction fetched and decoded in the first stage 2 a. Also, in the second stage 2 b a second operation is commenced regarding the context 3.

The instruction executed in the execute partial operation 6E is a conditional jump, the jump depending on a value of a parameter x. In FIG. 5 it can be seen on row 3 that if x=0, the program is continued on row L. However, the result of the execute partial operation 6E will not be known until the end of the clock cycle t+Δt (see FIG. 4), during which the context is in the second stage 2 b. Therefore, referring to FIG. 5, when the second operation is commenced it will not be known if the program will be on row L+1 or row 4, since it will not be known whether the execution of the instruction regarding the previously commenced operation will cause the program to jump or not. Therefore, a multiple branch execution of the second operation is performed, involving two fetch partial operations 7F1, 7F2 of the second operation. Referring to FIG. 5, one of these fetch partial operations is performed according to row 4 and the other is performed according to row L+1 in the program. Similarly, two decode partial operations 7D1, 7D2 are performed, one according to row 5 and the other according to row L+2 in the program.

In a third stage 2 c of the data processing pipeline, the context 3 is received and a store partial operation 6S of the first operation is performed, which, depending on whether or not a jump was performed in the preceding stage 2 b, is performed according to row 6 or row L+3 in the program (see FIG. 5).

Since the result of the execute partial operation 6E is known, it can be determined which one of the instructions fetched and decoded in the second stage 2 b should be executed in the third stage 2 c. If no jump was made as a result of the execute partial operation 6E in the second stage 2 b, the instruction fetched and decoded according to program rows 4 and 5 will be used in an execute partial operation 7E in the third stage 2 c according to row 7 in the program. If a jump was made as a result of the execute partial operation 6E in the second stage 2 b, an execute partial operation 7E will be performed using the instruction fetched and decoded in the second stage 2 b according to program rows L+1 and L+2 (see FIG. 5).
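The selection between the two fetched and decoded instructions can be sketched as follows. The program of FIG. 5 is not reproduced here, so the instruction memory contents, the textual instruction format and all function names are hypothetical; only the structure (fetch and decode both candidates in the second stage, select in the third stage once the jump outcome is known) follows the description.

```python
# Hypothetical instruction memory: one entry for the no-jump case and one
# for the jump target.
INSTRUCTION_MEMORY = {
    "no_jump": "ADD r1, r2",
    "jump_target": "SUB r1, r2",
}


def decode(raw: str):
    """Split a textual instruction into an opcode and a list of operand names."""
    opcode, operands = raw.split(" ", 1)
    return opcode, [name.strip() for name in operands.split(",")]


def second_stage(x: int):
    """Execute the conditional jump and, in parallel, fetch and decode both candidates."""
    jump_taken = (x == 0)  # outcome of the execute partial operation, known at end of cycle
    candidates = {
        False: decode(INSTRUCTION_MEMORY["no_jump"]),     # candidate if no jump is made
        True: decode(INSTRUCTION_MEMORY["jump_target"]),  # candidate if the jump is made
    }
    return jump_taken, candidates


def third_stage(jump_taken: bool, candidates: dict):
    """Pick the already decoded instruction that the execute partial operation will use."""
    return candidates[jump_taken]


print(third_stage(*second_stage(0)))  # jump made: the jump-target instruction is used
print(third_stage(*second_stage(7)))  # no jump: the fall-through instruction is used
```

In this sketch neither candidate has to be fetched again in the third stage, which corresponds to the case where no idle cycle is needed after the conditional jump.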

The multiple branch execution will require some additional hardware in the processor, since one or more partial operations are executed according to more than one part of the program simultaneously. However, in traditional methods the need to include no-operation commands in the program results in a lower performance of the processor. With multiple branch execution, no-operation commands can be avoided and a high performance can be obtained.

Alternatively, or in combination with multiple branch execution, a procedure, herein referred to as operand forwarding, can be used. To illustrate this procedure we refer to FIG. 6, in which a data processing pipeline with operation pipelining is illustrated.

In the pipeline in FIG. 6 operations containing two steps each are performed on a context 3. For the sake of clarity of this presentation, the pipeline contains only three stages 2 a, 2 b, 2 c, of which a final stage is not visible to a programmer of the pipeline, as explained above with reference to FIG. 1.

In a first stage 2 a an initial step of a first operation is executed in a clock cycle at a time t. The initial step includes a first instruction fetch partial operation 6 a 1, which is a partial operation of fetching an instruction from an instruction memory 21 a, and a first operand fetch partial operation 6 a 2, which is a partial operation of fetching at least one operand from the context 3. In this example, the operation also contains an instruction decoding partial operation, i.e. a partial operation, not depicted in FIG. 6, of decoding the instruction from the instruction memory. The partial operations of the initial step of the first operation result in first initial step results, i.e. an instruction R6 a 1 and operand R6 a 2.

In a subsequent clock cycle at a time t+Δt, in a second stage 2 b a final step of the first operation is executed. The final step includes a first execute partial operation 6 c 1, which is a partial operation of executing the instruction R6 a 1 on the operands R6 a 2 in a logic unit 5 b, and a first write back partial operation 6 c 2, which is a partial operation of writing back to the context 3 a result of the execution in the logic unit 5 b. (Each operation can also contain a branch partial operation, which is a partial operation, not depicted in FIG. 6, for updating a pointer in the context, which pointer is used for fetching an instruction.)

In the same clock cycle at a time t+Δt, in the second stage, an initial step of a second operation is also executed. This step includes a second instruction fetch partial operation 7 a 1. The initial step of the second operation also includes a second operand fetch partial operation. According to the procedure of operand forwarding, it is determined whether a position in a register in the context, from which an operand is to be fetched in the second operand fetch partial operation, is identical with a destination of the first write back partial operation 6 c 2. If any register position, from which an operand is to be fetched in the second operand fetch partial operation, is not identical with the destination of the first write back partial operation 6 c 2, the second operand fetch partial operation includes fetching 7 a 21 the operands from the context 3. However, if any register position, from which an operand is to be fetched in the second operand fetch partial operation, is identical with the destination of the first write back partial operation 6 c 2, the second operand fetch partial operation includes fetching 7 a 22 the object of the first write back partial operation 82a. This means fetching the result of the execution 6 c 1 of the instruction R6 a 1 on the operands R6 a 2 in the logic unit 5 b. Thus, the result of the execution of the instruction is “stolen” before the first operation is completed.

In short, where an instruction is to use a result of a preceding instruction, and would fetch a value from a register to which the preceding instruction has yet to write its new value, an incorrect value would be obtained. Instead, the result is fetched from another location, e.g. a temporary register or an ALU-result, i.e. directly in connection to an execution in the preceding instruction.
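The determination made in operand forwarding can be sketched in a few lines. The register names, the dictionary used as a register file and the function name fetch_operand are hypothetical; the sketch only assumes that the result of the preceding execution is available (for example as an ALU result) before it has been written back.

```python
def fetch_operand(registers, source, pending_destination, pending_result):
    """Fetch one operand for the second operation.

    If the source register is the destination of the not yet completed
    write back of the first operation, take the forwarded result instead
    of the stale register content.
    """
    if source == pending_destination:
        return pending_result      # forwarded result of the preceding execution
    return registers[source]       # ordinary fetch from the register


# Register contents before the first operation has written back.
registers = {"r0": 1, "r1": 2, "r2": 0}

# First operation: r2 = r0 + r1, executed but not yet written back.
pending_destination = "r2"
pending_result = registers["r0"] + registers["r1"]

# Second operation reads r2 and r1 in the same clock cycle.
a = fetch_operand(registers, "r2", pending_destination, pending_result)
b = fetch_operand(registers, "r1", pending_destination, pending_result)
print(a, b)  # prints "3 2"
```

Without the comparison, the second operation in this sketch would read the old content 0 of r2 rather than the result 3.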

Referring to FIG. 6, in the second stage 2 b, the partial operations of the initial step of the second operation result in second initial step results, i.e. an instruction R7 a 1 and operand R7 a 2. In a third stage 2 c, in a subsequent clock cycle at a time t+2Δt, a final step of the second operation is executed. The final step includes a second execute partial operation 7 c 1 and a second write back partial operation 7 c 2.

1. A method in a processor, in which data is processed in a pipelined manner, the data being included in a plurality of contexts, comprising a first context (3), each context passing a plurality of consecutive stages (2 a-2 f), in addition to which a plurality of operations is adapted to be executed on the contexts, each operation comprising a plurality of consecutive operation steps and the consecutive operation steps of one operation being executed on a context at at least two different consecutive stages (2 a-2 f), the method comprising: at a first stage (2 a), executing an initial operation step (6 a) of a first operation on the first context (3), and at a second stage (2 b) that consecutively follows the first stage (2 a), subsequently commencing an execution on the first context of an initial operation step (7 a) of a second operation before an execution on the first context (3) of a following operation step (6 b) of the first operation is completed, wherein, at each clock cycle of the processor, the first context (3) is received at one of the stages from the preceding stage, the first context is unconditionally moved to a next stage and a subsequent context of a subsequent operation is received at the first stage (2 a).
2. A method according to claim 1, comprising commencing at the first stage (2 a) an execution of the initial operation step (6 a) of the first operation on a second context before the execution on the first context (3) of the following operation step (6 b) of the first operation is completed.
3. A method according to claim 1, comprising receiving at the second stage a result (R6 a) of an execution of the initial operation step (6 a) of the first operation.
4. A method according to claim 1, whereby at least one of the operation steps of the second operation comprises at least two alternative execution paths, and at least two of the alternative execution paths of the operation step are executed.
5. A method according to claim 4, further comprising: obtaining results (R7 b 1, R7 b 2) of at least two of the executions of the alternative execution paths, and determining, based on a result (R6) of an execution of an operation step of an operation initiated before the initiation of the second operation, which one of the results (R7 b 1, R7 b 2), of the executions of the alternative execution paths, an execution of an operation step of the second operation, following said operation step comprising at least two alternative execution paths, is to be based on.
6. A method according to claim 1, whereby the processor is arranged so that the following operation step (6 b) of the first operation is presented to a programmer as being executed at the first stage (2 a).
7. A method according to claim 1, wherein the first operation comprises a partial operation of executing (6 c 1) an instruction and a partial operation of writing (6 c 2) a result of the said instruction execution into a destination in a register, and the second operation comprises the partial operation of fetching (7 a 21, 7 a 22) an operand, the method comprising (a) determining if a position in the register, from which the operand is to be fetched (7 a 21, 7 a 22) in the second operation, is identical with the destination of the partial operation, of the first operation, of writing (6 c 2) a result, (b) if the result of the determination in step (a) is negative, fetching (7 a 21) the operand from the register, and (c) if the result of the determination in step (a) is positive, fetching (7 a 22) the result of the said instruction execution.